Batch Load
Introduction
GigaSpaces Smart DIH can now define batch loads through a standard pipeline interface. Batch load can now be performed without the use of IIDR.
- Full batch load can now be performed as a "pull" operation, without requiring the "push" functionality used by a typical IIDR deployment.
- Tables based on materialized views (the query is executed and its results are saved in a table) and on views (the query itself is saved as SQL) are supported. Because these table types have no primary key, a Space ID must be defined.
- Direct JDBC connection to a data source is supported.
Configuring Batch Load: Helm
Enabling
Batch load is enabled through Kubernetes orchestration. It is not enabled by default.
To enable it, add the following flag to the helm command: global.batchload.enabled=true
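As a minimal sketch, the flag can be passed with `--set` on the helm command that installs the DIH umbrella chart. The release name `dih` and the chart reference `my-dih-repo/dih` are hypothetical placeholders, not names taken from this guide. The script only prints the command so that it can be reviewed before it is run.

```shell
#!/bin/sh
# Hypothetical placeholders: substitute your own release name and chart reference.
RELEASE="dih"
CHART="my-dih-repo/dih"

# Build the helm command with the batch load flag and print it (nothing is executed).
build_enable_cmd() {
  echo "helm install $RELEASE $CHART --set global.batchload.enabled=true"
}

build_enable_cmd
```

Printing the command instead of executing it keeps the sketch safe to run on a machine without cluster access.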
Adding the Agent
A separate Batch Load agent must be installed for each data source that is created. GigaSpaces also provides a separate helm chart for installing a batch load agent outside of the umbrella. This is used when a client requires more than one agent, for example when there are multiple Oracle databases.
To install an agent under the DIH umbrella:
global.batchload-agent.enabled=true
To install an agent and control its name:
global.batchload-agent.agent.name=[name of agent]
The batch load agent can also be installed outside of the helm umbrella, which is needed when a client requires more than one agent (for example, for multiple Oracle databases):
helm install di-agent [dih repo name]/di-agents --version 2.0.0 --set agent.name=[name of agent]
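For the multi-agent case, the standalone install can be repeated once per data source. The sketch below prints one command per agent. The repo name `my-dih-repo` and the agent names are hypothetical, and the per-agent release name `di-agent-<name>` is an assumption, made because Helm release names must be unique within a namespace.

```shell
#!/bin/sh
# Hypothetical values: replace with your helm repo name and your agent names.
DIH_REPO="my-dih-repo"
AGENTS="oracle-sales oracle-billing"

# Print one install command per agent (nothing is executed).
# Assumption: each agent gets its own release name, di-agent-<name>.
print_agent_installs() {
  for name in $AGENTS; do
    echo "helm install di-agent-$name $DIH_REPO/di-agents --version 2.0.0 --set agent.name=$name"
  done
}

print_agent_installs
```

Review the printed commands, then run them against the cluster once the repo and agent names are correct.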
Supported Data Sources and Loading Types
Currently, GigaSpaces supports the ability to perform full batch load from an Oracle DB. More data sources and loading types will be added in future releases.
Creating a Data Source for Batch Load
Batch Load cannot be configured for a pipeline that is configured and running with CDC (IIDR). To enable Batch Load, the appropriate configuration must be used when creating the Data Source.
To use Batch Load when creating a pipeline, add a new pipeline by following the steps outlined in the User Guide: SpaceDeck - Spaces - Adding a Pipeline for Batch Load
User Flows: Creating a Pipeline using Batch Load
Batch Load cannot be configured for a pipeline that is configured and running with CDC (IIDR). To enable Batch Load, a new pipeline must be created.
Oracle Database: Define Basic Full Batch Load Pipeline
- Define Oracle as the Data Source with the connector type = BATCHLOAD
A full batch load pipeline ends once the full load is completed, at which point its status should be Completed. This differs from a CDC pipeline, which keeps running to capture ongoing changes.